Distribution functions, extremal limits and optimal transport

Dedicated to the 125th anniversary of J.G. van der Corput 

M.R. Iacò, R.F. Tichy and S. Thonhauser*


Abstract 

Encouraged by the study of extremal limits for sums of the form

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} c(x_n, y_n)$$

with uniformly distributed sequences $\{x_n\}$, $\{y_n\}$, the following extremal problem is of interest:

$$\max_{\gamma} \int_{[0,1]^2} c(x,y)\,\gamma(dx,dy),$$

for probability measures $\gamma$ on the unit square with uniform marginals, i.e., measures whose distribution function is a copula.

The aim of this article is to relate this problem to combinatorial optimization and to the theory of optimal transport. Using different characterizations of maximizing $\gamma$'s one can give alternative proofs of some results from the field of uniform distribution theory and, beyond that, treat additional questions. Finally, some applications to mathematical finance are addressed.


1 Introduction and motivation 

In a series of papers J.G. van der Corput 071 HE] systematically investigated distribution functions 
of sequences of real numbers. Some of his main results are as follows: 

(i) Any sequence of real numbers has a distribution function. 

(ii) Any everywhere dense sequence of real numbers can be rearranged in such a way that the 
new sequence has an arbitrarily given distribution function. 

Clearly, in general a distribution function is not uniquely determined by the sequence. Furthermore, van der Corput established necessary and sufficient conditions for a set $M$ of non-decreasing functions such that $M$ is the set of distribution functions of some sequence of real numbers.

More recently, the study of distribution functions was extended to multivariate functions by the 
Slovak school of O. Strauch and his coworkers; see m^m l44] . In particular, they studied 
properties of the set of distribution functions of sequences in $[0,1]^2$ and various extremal problems
related to distribution functions. It should be noted that bi-variate distribution functions are 
well-known in financial mathematics for modeling dependencies in risk processes. 


*The authors are supported by the Austrian Science Fund (FWF) Project F5510 (part of the Special Research 
Program (SFB) “Quasi-Monte Carlo Methods: Theory and Applications”). The first author is also partially 
supported by the Austrian Science Fund (FWF): W1230, Doctoral Program “Discrete Mathematics” 
From Fialová & Strauch [52, Thm 1] one knows that for uniformly distributed sequences $\{x_n\}$, $\{y_n\}$ in $[0,1]$ and a continuous function $c : [0,1]^2 \to \mathbb{R}$ one has

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} c(x_n, y_n) = \int_{[0,1]^2} c(x,y)\,\gamma(dx,dy), \tag{1}$$

where $\gamma$ is a probability measure on the unit square equipped with the $\sigma$-algebra of Borel sets. Such a measure exhibits a bi-variate distribution function $C : [0,1]^2 \to [0,1]$ which is generally called a copula. The aim of this article is to provide a connection between the problem of finding extremal limits in (1) (or maximal and minimal bounds for such limits) by studying the optimization problem

$$\int_{[0,1]^2} c(x,y)\,\gamma(dx,dy) \to \max \tag{2}$$

and the field of optimal transport. Indeed, we will show how this problem can be perfectly embedded in the general theory of optimal transport.

Motivated by the discussion on the limiting property (1), problem (2) attracted some attention in the number theoretic community and found its way into the collection of unsolved problems of Uniform Distribution Theory.¹ We will mention some existing results in that context below. Notice that in the uniform distribution literature, problem (2) is originally written as an optimization with respect to functions $C : [0,1]^2 \to [0,1]$ satisfying the following properties: for every $x, y \in [0,1]$

$$C(x,0) = C(0,y) = 0,$$

$$C(x,1) = x \quad\text{and}\quad C(1,y) = y,$$

and for every $x_1, x_2, y_1, y_2 \in [0,1]$ with $x_2 \ge x_1$ and $y_2 \ge y_1$

$$C(x_2,y_2) - C(x_2,y_1) - C(x_1,y_2) + C(x_1,y_1) \ge 0.$$

Clearly, in this particular situation a copula is a bivariate distribution function on $[0,1]^2$ with standard uniform marginals. From the above stated properties one additionally sees that a copula $C$ induces a (Borel) probability measure $\gamma$ on $[0,1]^2$ via the formula

$$\gamma([a,b] \times [c,d]) = C(b,d) - C(b,c) - C(a,d) + C(a,c).$$
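The rectangle formula can be tried out numerically. The following minimal Python sketch evaluates it for two standard copulas, the product copula $C(u,v) = uv$ and the copula $C(u,v) = \min(u,v)$; the function names `product_copula`, `min_copula` and `rect_mass` are illustrative choices and do not come from the text.

```python
def product_copula(u, v):
    """Copula of two independent uniform variables: C(u, v) = u * v."""
    return u * v


def min_copula(u, v):
    """Copula C(u, v) = min(u, v); its mass sits on the diagonal y = x."""
    return min(u, v)


def rect_mass(C, a, b, c, d):
    """Mass assigned to the rectangle [a, b] x [c, d] by the copula C."""
    return C(b, d) - C(b, c) - C(a, d) + C(a, c)


if __name__ == "__main__":
    # For the product copula this is just the area of the rectangle.
    print(rect_mass(product_copula, 0.2, 0.5, 0.3, 0.9))
    # For min(u, v) it is the length of the diagonal piece inside the rectangle.
    print(rect_mass(min_copula, 0.2, 0.5, 0.3, 0.9))
```

For the product copula the result is the area of the rectangle, while for $\min(u,v)$ it is the length of the part of the diagonal that falls into the rectangle.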

A first result in the direction of extremal limits (1) is given in [35], where the authors take $c(x,y) = |x - y|$ in order to find optimal upper and lower bounds on the average distance between consecutive points of u.d. sequences. In particular, they determined the value of the limit $\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} |x_{n+1} - x_n|$ for the van der Corput sequence $(\phi_b(n))_{n \ge 0}$ in base $b$.
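The average distance between consecutive van der Corput points can be explored empirically. The sketch below is only an illustration: the helper names `van_der_corput` and `average_consecutive_distance` are assumptions made here, and the finite average is merely an approximation of the limit. For $b = 2$, counting how often $n$ ends in exactly $k$ binary ones suggests that the limit equals $1/2$, which the sketch reproduces approximately.

```python
def van_der_corput(n, b=2):
    """Radical inverse phi_b(n): mirror the base-b digits of n at the radix point."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, b)
        denom *= b
        x += digit / denom
    return x


def average_consecutive_distance(N, b=2):
    """Compute (1/N) * sum_{n=0}^{N-1} |phi_b(n+1) - phi_b(n)|."""
    total = 0.0
    prev = van_der_corput(0, b)
    for n in range(1, N + 1):
        cur = van_der_corput(n, b)
        total += abs(cur - prev)
        prev = cur
    return total / N


if __name__ == "__main__":
    for b in (2, 3, 5):
        print(b, average_consecutive_distance(10_000, b))
```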

By looking at the same problem, but in the formulation of (2), the authors in [52] could give an explicit formula for the asymptotic distribution function of the sequence $(\phi_b(n), \phi_b(n+1))_{n \ge 0}$, that is, of the copula $C(x,y)$.

The problem of finding the limit distribution of consecutive elements of the van der Corput sequence was also considered in [T], but using a different approach based on ergodic properties of the sequence itself. However, this approach does not give an explicit form of the copula $C(x,y)$. This last problem is not easy to handle and, apart from the already mentioned papers [23, 35], only [22] is known, where the authors found an explicit asymptotic distribution function of the sequence $(\phi_b(n), \phi_b(n+1), \phi_b(n+2))_{n \ge 0}$.

Problem (2) has been recently studied in [25] in connection with a well-known problem in combinatorial optimization, namely the linear assignment problem. By means of this tool, the authors

¹ Problem 1.29 in the open problem collection as of 28 November 2013
(http://www.boku.ac.at/MATH/udt/unsolvedproblems.pdf)
give optimal upper and lower bounds for integrals of two-dimensional, piecewise constant functions with respect to copulas and construct the copulas for which these bounds are attained. More precisely, the copulas realizing these bounds are shuffles of M, where the permutation $\sigma$ is the one which solves the assignment problem.

From the uniform distribution point of view, this class of copulas represents a family of uniform distribution preserving (u.d.p.) mappings, i.e. maps $f$ generating u.d. sequences $(f(x_n))_{n \in \mathbb{N}}$ for every u.d. sequence $(x_n)_{n \in \mathbb{N}}$. We will discuss the linear assignment problem and the algorithm that solves it more extensively in one of the following sections.

Problem (2) is known as the Monge-Kantorovich transport problem. Its origin is the question of how to transport soil from one location to another at minimal costs. More precisely, suppose these two locations are disjoint subsets $M$ and $F$ of the Euclidean plane and that $c(x,y) : M \times F \to [0,\infty)$ is the cost of transporting one shipment of soil from $x$ to $y$. For simplicity, we assume that there is no splitting of shipments. Thus, a transport map is a function $T : M \to F$. The goal is to find the optimal transport map which minimizes the total costs

$$c(T) := \sum_{m \in M} c(m, T(m)),$$

under the restriction that all the soil needs to be moved. We will give a precise description of the optimal transport problem and fundamental results in Section 3.

The Monge-Kantorovich problem has found a great variety of applications in pure and applied 
mathematics, such as Ricci curvature [29], nonlinear partial differential equations [12], gradient
flows [3], structure of cities m, maximization of profits na, leaf growth [50] and so on. 

As a prominent field of application of the theory of copulas and the transport problem one needs to mention financial mathematics. In the last decade copulas, or more precisely some parametrized families of copulas, became very popular in finance for modelling dependence structures within groups of assets or more generally between different kinds of risk factors. In the past the study of dependence structures was typically reduced to the determination of correlation coefficients. However, correlation coefficients describe dependencies perfectly only in the situation of marginally normally distributed risk factors, while distributions obtained from financial market data are typically not normal. A standard introduction to risk modeling and particularly to practical aspects of copula modeling is the book by McNeil et al. [30].

It turned out that a precise description of the dependency structure within a portfolio of risks is not feasible in practice. For this reason one tries to determine some (one may call them worst case or robust) bounds on risk measures of portfolios. For this purpose it is possible to utilize variants of the optimal transport problem: try to minimize a risk measure of a portfolio of several risks with respect to their distribution while preserving their marginals. Some recent publications studying problems from risk management are Rüschendorf SSI, Puccetti & Rüschendorf [3^ or Bernard et al. m- A paper dealing with model independent bounds on option prices using the theory of optimal transport is Beiglböck et al. [7].

2 Mathematical formulation 

In this section we give precise statements of the mathematical objects at hand. We refer to [11, 21, 26, 33] for details. Our starting point is Sklar's Theorem (see e.g. [33, Theorem 3.2.2]), a classical result about copulas which provides the theoretical foundation for the application of copulas.

Theorem 2.1

Given a d-dimensional distribution function (d.f.) $H$ with marginals $F_1, \dots, F_d$, there exists a d-copula $C$ such that for all $(x_1, x_2, \dots, x_d) \in \mathbb{R}^d$

$$H(x_1, x_2, \dots, x_d) = C(F_1(x_1), F_2(x_2), \dots, F_d(x_d)). \tag{3}$$

The copula $C$ is uniquely defined on $\prod_{j=1}^{d} \mathrm{Ran}(F_j)$ and is therefore unique if all the marginals are continuous (here $\mathrm{Ran}(F_j)$ denotes the range of $F_j$).

Conversely, if $F_1, F_2, \dots, F_d$ are d (1-dimensional) d.f.'s, then the function $H$ defined through Eq. (3) is a d-dimensional d.f.

Given a d-variate d.f. $F$, one can derive a copula $C$. Specifically, when the marginals $F_i$ are continuous, $C$ can be obtained by means of the formula

$$C(u_1, u_2, \dots, u_d) = F\bigl(F_1^{-1}(u_1), F_2^{-1}(u_2), \dots, F_d^{-1}(u_d)\bigr),$$

where $F_i^{-1}(s) = \inf\{t \mid F_i(t) \ge s\}$ is the pseudo-inverse of $F_i$.

Thus, copulas are essentially a way of transforming the r.v. $(X_1, X_2, \dots, X_d)$ into another r.v. $(U_1, U_2, \dots, U_d) = (F_1(X_1), F_2(X_2), \dots, F_d(X_d))$ having uniform margins on $[0,1]$ and preserving the dependence among the components. As we have seen in Section 1, every copula $C$ induces a probability measure $\gamma$. Moreover, there is a one-to-one correspondence between copulas and doubly stochastic measures. For every copula $C$, the measure $\gamma$ is doubly stochastic in the sense that for every Borel set $B \subseteq [0,1]$, $\gamma([0,1] \times B) = \gamma(B \times [0,1]) = \lambda(B)$, where $\lambda$ is the Lebesgue measure on $[0,1]$. Conversely, for every doubly stochastic measure $\mu$, there exists a copula $C$ given by $C(u,v) = \mu([0,u] \times [0,v])$. Clearly, a probability measure on $([0,1]^2, \mathcal{B}([0,1]^2))$ with uniform marginals is doubly stochastic. Therefore, we can translate some measure-theoretic concepts and results into the language of copulas.

In particular, we are interested in the correspondence between copulas $C$ and measure-preserving transformations $f, g$ on the unit interval, via the formula

$$C_{f,g}(u,v) = \lambda\bigl(f^{-1}[0,u] \cap g^{-1}[0,v]\bigr).$$

We refer to m for details and the study of related properties. 

The following theorem shows how every copula can be bounded from above and below. The upper and lower bounds are called Fréchet-Hoeffding bounds.

Theorem 2.2

Suppose $F_1, \dots, F_d$ are marginal d.f.'s and $F$ is any joint d.f. with those given marginals, then for all $x \in \mathbb{R}^d$

$$\max\Bigl(\sum_{i=1}^{d} F_i(x_i) + 1 - d,\ 0\Bigr) \le F(x) \le \min(F_1(x_1), \dots, F_d(x_d)). \tag{4}$$

The right-hand side of (4) is always a copula, whereas the left-hand side is a copula only for d = 2,
see [5^ Theorem 3.2 and 3.3]. 
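For d = 2 and uniform marginals the two bounds in (4) read $W(u,v) = \max(u+v-1, 0)$ and $M(u,v) = \min(u,v)$. The following minimal Python sketch checks the inequality $W \le C \le M$ on a grid for a candidate function $C$; the name `satisfies_frechet_bounds` and the grid resolution are illustrative assumptions, and passing the check is only a necessary condition for being a copula.

```python
def W(u, v):
    """Lower Frechet-Hoeffding bound for d = 2."""
    return max(u + v - 1.0, 0.0)


def M(u, v):
    """Upper Frechet-Hoeffding bound for d = 2."""
    return min(u, v)


def satisfies_frechet_bounds(C, grid_size=50, tol=1e-12):
    """Check W(u, v) <= C(u, v) <= M(u, v) on an equidistant grid in [0, 1]^2."""
    for i in range(grid_size + 1):
        for j in range(grid_size + 1):
            u, v = i / grid_size, j / grid_size
            if not (W(u, v) - tol <= C(u, v) <= M(u, v) + tol):
                return False
    return True


if __name__ == "__main__":
    print(satisfies_frechet_bounds(lambda u, v: u * v))        # product copula
    print(satisfies_frechet_bounds(lambda u, v: (u + v) / 2))  # not a copula
```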

Thus, the problem is to find bounds of the form

$$\int_0^1\!\!\int_0^1 c(x,y)\,dC_{\min}(x,y) \le \int_0^1\!\!\int_0^1 c(x,y)\,dC(x,y) \le \int_0^1\!\!\int_0^1 c(x,y)\,dC_{\max}(x,y), \tag{5}$$

where $C_{\min}$, $C_{\max}$ are copulas.

A particularly interesting subclass of copulas for our problems are so-called shuffles of M, see [33, Section 3.2.3]. They represent a construction principle that generates new copulas by means of a suitable rearrangement of the mass distribution of the upper Fréchet bound M.
Definition 2.3 (Shuffles of M)

Let $n \ge 1$, $s = (s_0, \dots, s_n)$ be a partition of the unit interval with $0 = s_0 < s_1 < \dots < s_n = 1$, $\pi$ be a permutation of $S_n = \{1, \dots, n\}$ and $\omega : S_n \to \{-1, 1\}$. We define the partition $t = (t_0, \dots, t_n)$, $0 = t_0 < t_1 < \dots < t_n = 1$ such that each $[s_{i-1}, s_i[ \times [t_{\pi(i)-1}, t_{\pi(i)}[$ is a square. A copula $C$ is called shuffle of M with parameters $\{n, s, \pi, \omega\}$ if it is defined in the following way: for all $i \in \{1, \dots, n\}$, if $\omega(i) = 1$, then $C$ distributes a mass of $s_i - s_{i-1}$ uniformly spread along the diagonal of $[s_{i-1}, s_i[ \times [t_{\pi(i)-1}, t_{\pi(i)}[$, and if $\omega(i) = -1$ then $C$ distributes a mass of $s_i - s_{i-1}$ uniformly spread along the antidiagonal of $[s_{i-1}, s_i[ \times [t_{\pi(i)-1}, t_{\pi(i)}[$.

Note that the two Fréchet-Hoeffding bounds W, M are trivial shuffles of M with parameters $\{1, (0,1), (1), -1\}$ and $\{1, (0,1), (1), 1\}$, respectively. Furthermore it is well-known that every copula can be approximated arbitrarily closely with respect to the supremum norm by a shuffle of M; see e.g. [33, Theorem 3.2.2].
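The construction in Definition 2.3 can be made concrete by writing down the piecewise linear map whose graph carries the mass of a shuffle of M. The Python sketch below is one possible reading of the definition, not code from the cited references: `shuffle_map` is a hypothetical helper, the permutation is passed as a 1-based list `pi`, and the partition $t$ is derived from the requirement that each strip maps onto a square.

```python
def shuffle_map(s, pi, omega):
    """Return the map x -> y supporting the shuffle of M with parameters (n, s, pi, omega).

    s     : partition points, 0 = s[0] < ... < s[n] = 1
    pi    : permutation of 1..n, given as a list with pi[i - 1] = pi(i)
    omega : list of +1 (diagonal) or -1 (antidiagonal), one entry per strip
    """
    n = len(s) - 1
    widths = [s[i + 1] - s[i] for i in range(n)]
    # The t-interval number pi(i) needs the same width as the s-interval number i.
    t_widths = [0.0] * n
    for i in range(n):
        t_widths[pi[i] - 1] = widths[i]
    t = [0.0]
    for w in t_widths:
        t.append(t[-1] + w)

    def f(x):
        for i in range(n):
            if s[i] <= x < s[i + 1] or (i == n - 1 and x == s[n]):
                j = pi[i] - 1
                offset = x - s[i]
                # Diagonal runs upwards from t[j], antidiagonal downwards from t[j + 1].
                return t[j] + offset if omega[i] == 1 else t[j + 1] - offset
        raise ValueError("x must lie in [0, 1]")

    return f


if __name__ == "__main__":
    swap = shuffle_map([0.0, 0.5, 1.0], [2, 1], [1, 1])
    print(swap(0.25), swap(0.75))
```

With the trivial parameters from the note above, the sketch returns $x \mapsto x$ for M and $x \mapsto 1 - x$ for W.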

In [20, Theorem 4] shuffles of M are characterized in terms of measure-preserving transformations $T$ of $[0,1]$ and the push-forward of the doubly stochastic measure induced by M. More precisely, the authors proved the following result.

Theorem 2.4

Let $\gamma_C$ denote the doubly stochastic measure induced by the copula $C$ and $\gamma_M$ be the doubly stochastic measure induced by M. The following statements are equivalent:

(a) a copula $C$ is a shuffle of M;

(b) there exists a piecewise continuous measure-preserving permutation $T$ of $[0,1]$ such that $\gamma_C = S_T \# \gamma_M$, where $S_T : [0,1]^2 \to [0,1]^2$ is defined as $S_T(u,v) = (T(u), v)$ for every $(u,v) \in [0,1]^2$.

On the other hand, the general problem is that of determining curves in the unit square which can be considered as the support of a copula. In [31] it has been proven that, for every copula obtained as a shuffle of M, there is a piecewise linear function whose graph supports the probability mass. In this context, the following general result holds.

Proposition 2.5

Let $f : [0,1] \to [0,1]$ be a Borel measurable function. Then there exists a copula $C$ whose associated measure $\gamma$ has its mass concentrated on the graph of $f$ (with $\gamma(G(f)) = 1$) if, and only if, the function $f$ preserves the Lebesgue measure $\lambda$.

These results provide an interesting link between the theory of copulas and the theory of uniform distribution of sequences of points. In particular, shuffles of M can be considered as special uniform distribution preserving (u.d.p.) mappings. Recently, different concepts of convergence for copulas have been used; the question arising in this context is about the closure of the class of shuffles of M with respect to different notions of convergence and thus topologies. This problem
has been considered for instance in m- 

Now we consider the connection between copulas and uniformly distributed sequences of points in 
[0,1[. For a detailed account on this concept the interested reader is referred to Drmota & Tichy 

m- 


Definition 2.6

A sequence $(x_n)_{n \in \mathbb{N}}$ of points in $[0,1[$ is called uniformly distributed (u.d.) if and only if

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} 1_{[a,b[}(x_n) = b - a$$

for all intervals $[a,b[ \subseteq [0,1[$, where $1_E$ denotes as usual the indicator function of the set $E$.
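We call $C$ the asymptotic distribution function (a.d.f.) of a point sequence $(x_n, y_n)_{n \in \mathbb{N}}$ in $[0,1[^2$ if

$$C(x,y) = \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} 1_{[0,x[ \times [0,y[}(x_n, y_n)$$

holds in every point $(x,y)$ of continuity of $C$. In [23] Fialová and Strauch consider

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} f(x_n, y_n),$$

where $(x_n)_{n \ge 1}$, $(y_n)_{n \ge 1}$ are u.d. sequences in the unit interval and $f$ is a continuous function on $[0,1]^2$. In this case the a.d.f. $g$ of $(x_n, y_n)_{n \ge 1}$ is always a copula and we can write

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} f(x_n, y_n) = \int_0^1\!\!\int_0^1 f(x,y)\,dg(x,y).$$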
The following theorem proved in Tichy & Winkler [45] is of particular relevance for us in order to show the connection between uniform distribution preserving maps and approximations by means of shuffles of M [31].

Theorem 2.7

The set of all continuous piecewise linear u.d.p. mappings is dense in the set of all continuous u.d.p. mappings with respect to uniform convergence.

3 Theory of optimal transport 

In this section we briefly state fundamental results from the theory of the Monge-Kantorovich optimal transport problem. The presented results on existence of optimizers and the dual problem formulation are stated in adequate depth such that the number theoretic questions under consideration are covered. Comprehensive accounts of the subject are Villani [33] or Rachev & Rüschendorf [37], for example; a set of lecture notes on the topic is Ambrosio & Gigli [2]. For the basic presentation of the optimal transport problem we follow the above-mentioned references.


3.1 Problem formulation 

Let $X$ and $Y$ be Polish spaces and denote by $\mathcal{P}(X)$ the set of all Borel probability measures on $X$ ($\mathcal{P}(Y)$ on $Y$). For a Borel-measurable map $T : X \to Y$ and $\mu \in \mathcal{P}(X)$, the measure $T_\#\mu \in \mathcal{P}(Y)$ defined by

$$T_\#\mu(E) = \mu(T^{-1}(E)) \quad\text{for } E \in \mathcal{B}(Y)$$

is called the push forward of $\mu$ through $T$. Together with a Borel measurable cost function $c : X \times Y \to \mathbb{R} \cup \{+\infty\}$ we can give:

Monge formulation: Let $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$ and minimize

$$\int_X c(x, T(x))\,\mu(dx) \tag{7}$$

among all transport maps $T$ from $\mu$ to $\nu$ (all maps for which $T_\#\mu = \nu$). The transport map $T$ has the meaning that a unit mass is moved from the point $x$ to the point $y = T(x)$ and at the same time the distribution $\nu$ is achieved.

The relaxation of Monge's optimal transport problem is the following:

Kantorovich formulation: Let $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$ and minimize

$$\int_{X \times Y} c(x,y)\,\gamma(dx,dy) \tag{8}$$

in the set $ADM(\mu, \nu)$ of all transport plans $\gamma \in \mathcal{P}(X \times Y)$ from $\mu$ to $\nu$, i.e., the set of all Borel probability measures on $X \times Y$ such that

$$\gamma(A \times Y) = \mu(A) \quad \forall A \in \mathcal{B}(X),$$

$$\gamma(X \times B) = \nu(B) \quad \forall B \in \mathcal{B}(Y).$$

Now the meaning of a transport plan is that $\gamma(A \times B)$, for $A \in \mathcal{B}(X)$ and $B \in \mathcal{B}(Y)$, denotes the mass initially placed in $A$ which is moved into $B$; in contrast to transport maps a unit mass can now be split. One can immediately answer the question of existence of an optimizer.


Theorem 3.1 (Th. 1.5 from [2])

Assume $c$ is lower semicontinuous, then there exists a minimizer for problem (8).


The following theorem is an excerpt from Theorem 5.10 of Villani [49]; it will give us the right
tools for the construction of optimal solutions for some particular examples. But first we need
some notions from [SZIIIS]. 


Definition 3.2 (c-cyclical monotonicity)

A set $\Gamma \subseteq X \times Y$ is c-cyclically monotone if for all $(x_n, y_n) \in \Gamma$ with $1 \le n \le N$ for $N \in \mathbb{N}$, $x_{N+1} = x_1$, it holds that

$$\sum_{n=1}^{N} c(x_n, y_n) \le \sum_{n=1}^{N} c(x_{n+1}, y_n).$$


Definition 3.3 (c-convexity)

Let $X$, $Y$ be sets and $c : X \times Y \to \mathbb{R} \cup \{+\infty\}$. A function $f : X \to \mathbb{R} \cup \{+\infty\}$ is c-convex if it is not identically $+\infty$ and there exists $a : Y \to \mathbb{R} \cup \{\pm\infty\}$ such that

$$f(x) = \sup_{y \in Y} [a(y) - c(x,y)] \quad\text{for all } x \in X.$$

Its c-transform is defined by

$$f^c(y) = \inf_{x \in X} [f(x) + c(x,y)] \quad\text{for all } y \in Y,$$

and its c-subdifferential is the set

$$\partial_c f = \{(x,y) \in X \times Y \mid f^c(y) - f(x) = c(x,y)\},$$

or at given $x \in X$

$$\partial_c f(x) = \{y \in Y \mid f^c(y) - f(x) = c(x,y)\}.$$

Now everything is clarified such that we can state the theorem.
Theorem 3.4 (Th. 5.10 ii) [49])

Let $X$ and $Y$ be Polish spaces, $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$. Let the cost function $c : X \times Y \to \mathbb{R} \cup \{+\infty\}$ be lower semicontinuous such that $c(x,y) \ge a(x) + b(y)$ for all $(x,y) \in X \times Y$ for some real-valued upper semicontinuous functions $a \in L^1(\mu)$ and $b \in L^1(\nu)$. Furthermore assume that the optimal value of (8) is finite, then there is a measurable c-cyclically monotone set $\Gamma \subseteq X \times Y$ such that for any $\gamma \in ADM(\mu, \nu)$ the following statements are equivalent:

(a) $\gamma$ is optimal,

(b) $\gamma$ is concentrated on a c-cyclically monotone set,

(c) there is a c-convex function $f$ such that, $\gamma$-a.s., $f^c(y) - f(x) = c(x,y)$,

(d) $\gamma$ is concentrated on $\Gamma$.

Another variant, or better formulation, of problem (8) is through a coupling of random variables. This formulation is of a more probabilistic nature and perfectly suits applications from mathematical finance.

Coupling formulation: At first the 'inf' from (8) is turned into a 'sup', according to a standard presentation from Rüschendorf [33]. Secondly, instead of transport plan the notion coupling is used, i.e. coupling of two random variables with fixed marginal distributions. Then one may write problem (8) as, cf. [33, Sec. 4.2],

$$\sup\{E(c(X_1, X_2)) \mid (X_1, X_2) \text{ couplings of } \mu, \nu, \text{ with } P^{X_1} = \mu \text{ and } P^{X_2} = \nu\}. \tag{9}$$

The supremum is taken over the set of all bivariate $X \times Y$-valued random variables $(X_1, X_2)$ with given marginal distributions $\mu$ and $\nu$. Consequently, $\gamma \in ADM(\mu, \nu)$ corresponds to the bivariate distribution of $(X_1, X_2)$.


For translating Theorem 3.4 to this maximization situation one needs to adapt the c-convexity notion. Now a function $f : X \to \mathbb{R}$ is called c-convex if it has a representation $f(x) = \sup_y \{c(x,y) + a(y)\}$, for some function $a$. The c-subdifferential of $f$ at $x$ is directly (hiding the c-transform) given by

$$\partial_c f(x) = \{y \mid f(z) - f(x) \ge c(z,y) - c(x,y) \ \ \forall z \in X\}$$

and $\partial_c f = \{(x,y) \in X \times Y \mid y \in \partial_c f(x)\}$.

Under the assumptions of Theorem 3.4 for $X$, $Y$ and lower semicontinuity of $c$ one gets an analogous statement.


Theorem 3.5 (Th. 4.7 from [39])

Let $c$ be such that $c(x,y) \ge a(x) + b(y)$ for some $a \in L^1(\mu)$, $b \in L^1(\nu)$ and assume finiteness of (9). Then a pair $(X_1, X_2)$ with $X_1 \sim \mu$, $X_2 \sim \nu$ is an optimal c-coupling between $\mu$ and $\nu$ if and only if

$$(X_1, X_2) \in \partial_c f \quad\text{a.s.}$$

for some c-convex function $f$, equivalently, $X_2 \in \partial_c f(X_1)$ a.s.


Uniform Marginals 

In the following sections we will apply Theorem 3.4 to problems in the spirit of [23] and in particular to problem (2). Therefore in these applications we choose $X = Y = [0,1]$, fix the marginal distributions to be uniform, $\mu = \nu = U([0,1])$, and choose the cost function $c : [0,1]^2 \to \mathbb{R}$ to be continuous. For this setting Theorem 3.4 applies and (8) is clearly finite. In this setting the Monge problem is an optimization problem with respect to uniform distribution preserving maps.





Remark 3.1

In general the question of whether an optimal transport plan in the Kantorovich formulation (8) is induced by an optimal transport map from the Monge formulation (7) is not answered in the corresponding literature. But there are some particular situations which allow for affirmative results. For instance, for Borel measures $\mu$, $\nu$ on $\mathbb{R}^d$, $c(x,y) = h(x - y)$ for a strictly convex function $h$ and $\mu$ absolutely continuous with respect to Lebesgue measure, there exists a unique map $s$ such that the measure $\gamma = (\mathrm{id} \times s)_\#\mu$ is optimal. For the situation of one-dimensional marginals several examples of a similar structure are given in Uckelmann [46] and Rüschendorf & Uckelmann nn.

Theorem 3.4, or its complete formulation which is Theorem 5.10 from Villani [49], heavily relies on a dual problem formulation. How far the duality relation can be exploited, i.e., up to which parameter configuration the value of the primal solution equals the value of the dual solution, is discussed in recent work of Beiglböck & Schachermayer m and Beiglböck et al. m


3.2 Application of Theorem 3.4

Now we are ready to give two applications of Theorem 3.4. The first one, dealing with a result from [23], concerns a direct verification of optimality via the c-cyclical monotonicity property. The second one treats an open problem from [43] by constructing explicitly a c-convex function.

We start by studying Theorem 5 from Fialová & Strauch [23] from the transport point of view.


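Theorem 3.6 ([23])

Let $c : [0,1]^2 \to \mathbb{R}$ be a Riemann integrable function with $\frac{\partial^2 c}{\partial x \partial y}(x,y) > 0$ for all $(x,y) \in (0,1)^2$. Then

$$\max_{\gamma\text{ - copula}} \int_0^1\!\!\int_0^1 c(x,y)\,\gamma(dx,dy) = \int_0^1 c(x,x)\,dx,$$

$$\min_{\gamma\text{ - copula}} \int_0^1\!\!\int_0^1 c(x,y)\,\gamma(dx,dy) = \int_0^1 c(x,1-x)\,dx,$$

where the maximum is attained in $\gamma(x,y) = \min\{x,y\}$ and the minimum in $\gamma(x,y) = \max\{x+y-1, 0\}$, uniquely.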


Remark 3.2

The maximum and the minimum are attained at the so-called upper and lower Fréchet bounds. Now by means of the c-cyclical monotonicity criterion for optimality we can give an alternative proof of the theorem above. The procedure is similar to the one used in the proof of Proposition 1 from Rochet [38] in a slightly different context. Rochet proves that if the derivative condition is fulfilled, the support of a transport plan $r(x)$ is c-cyclically monotone if and only if $r(\cdot)$ is non-decreasing. Combining this with a result on uniform distribution preserving maps from Tichy & Winkler [45] one arrives at Fialová & Strauch's result that $r(x) = x$ is the maximizing transport plan and $r(x) = 1 - x$ is the minimizing one.

Notice that Theorem 3.1.2 from ms includes the situations of Theorem 5 and 6 from Fialová & Strauch [23].

Proof: We need to show that for $N \in \mathbb{N}$ and $x_0, x_1, \dots, x_N$

$$\sum_{n=0}^{N} c(x_{n+1}, y_n) - c(x_n, y_n) \le 0,$$

where $x_{N+1} = x_0$ and $(x_n, y_n)$ are in the support of the measure whose optimality is to be shown. In our particular situation for the upper bound, $y_n = r(x_n) = x_n$. We start with a cycle of length 2 ($N = 1$) and points $x_0, x_1$ and assume w.l.o.g. $x_1 > x_0$. From

$$c(x_0,x_1) - c(x_1,x_1) + c(x_1,x_0) - c(x_0,x_0) = -\int_{x_0}^{x_1}\!\!\int_{x_0}^{x_1} \frac{\partial^2 c}{\partial x \partial y}(x,y)\,dy\,dx < 0$$

the statement follows. Now assume that for all cycles of length $N$ the statement holds true,

$$\sum_{n=0}^{N-1} c(x_{n+1}, x_n) - c(x_n, x_n) \le 0,$$

and choose arbitrary $x_0, x_1, \dots, x_N$ (again w.l.o.g. $x_N = \max_{n \in \{0,1,\dots,N\}} x_n$). Fix a sub-cycle of length $N$ with $x'_n = x_n$ for $0 \le n \le N-1$ and $x'_N = x_0$. Then using

$$A := \sum_{n=0}^{N-1} c(x'_{n+1}, x'_n) - c(x'_n, x'_n) \le 0,$$

we get ($x_{N+1} = x'_N = x_0$),

$$\sum_{n=0}^{N} c(x_{n+1}, x_n) - c(x_n, x_n) = A + [c(x_N, x_{N-1}) - c(x_{N-1}, x_{N-1})] - [c(x_0, x_{N-1}) - c(x_{N-1}, x_{N-1})] + [c(x_{N+1}, x_N) - c(x_N, x_N)]$$

$$\le c(x_N, x_{N-1}) - c(x_0, x_{N-1}) + c(x_0, x_N) - c(x_N, x_N)$$

$$= -\int_{x_0}^{x_N}\!\!\int_{x_{N-1}}^{x_N} \frac{\partial^2 c}{\partial x \partial y}(x,y)\,dy\,dx \le 0,$$

which completes the proof.

□
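The cycle inequality from the proof can be spot-checked numerically. The sketch below is only a randomized sanity check under stated assumptions, not a proof: it uses the sample cost $c(x,y) = xy$ (whose mixed derivative is 1), and the helpers `cycle_sum` and `max_cycle_sum` are hypothetical names introduced here.

```python
import random


def cycle_sum(c, xs, r):
    """Sum over the cycle of c(x_{n+1}, r(x_n)) - c(x_n, r(x_n)), with x_{N+1} = x_0."""
    total = 0.0
    for n in range(len(xs)):
        x_next = xs[(n + 1) % len(xs)]
        y = r(xs[n])
        total += c(x_next, y) - c(xs[n], y)
    return total


def max_cycle_sum(c, r, trials=2000, max_len=6, seed=0):
    """Largest cycle sum found over random cycles with points in [0, 1]."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        xs = [rng.random() for _ in range(rng.randint(2, max_len))]
        worst = max(worst, cycle_sum(c, xs, r))
    return worst


if __name__ == "__main__":
    cost = lambda x, y: x * y
    # r(x) = x: all sampled cycle sums should be <= 0 (up to rounding).
    print(max_cycle_sum(cost, lambda x: x))
    # r(x) = 1 - x: some cycle sums are positive, so this plan is not maximizing.
    print(max_cycle_sum(cost, lambda x: 1.0 - x))
```

For $r(x) = x$ no sampled cycle violates the inequality, whereas for $r(x) = 1 - x$ positive cycle sums appear, in line with the roles of the two Fréchet bounds in Theorem 3.6.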


The sine example

Now we can turn to a particular result from Uckelmann [46], see Rüschendorf & Uckelmann m as well, which matches the sine question well. It is an open problem from [43] which could not be treated by direct analytical methods.

For dealing with this example one may utilize a modification of Theorem 1 from [46]. We re-state the following version of this result and sketch its proof.

Remark 3.3

In this particular setting it is possible to construct explicitly a c-convex function $f$ such that Theorem 3.5 can be used to deduce the optimal transport plan, i.e. optimal coupling. Notice, see [39], that $y \in \partial_c f(x)$ if and only if there exists $a\,(= a(y)) \in \mathbb{R}$ such that

$$\psi_{y,a}(x) = c(x,y) + a(y) = f(x) \quad\text{and}\quad \psi_{y,a}(\xi) = c(\xi,y) + a(y) \le f(\xi) \ \text{ for all } \xi \in X. \tag{10}$$


Theorem 3.7

Let $\mu$, $\nu$ be the uniform distribution on $[0,1]$ and the cost function $c(x,y) = \phi(x+y)$ with $\phi : [0,2] \to \mathbb{R}$. In particular we assume that $\phi \in C^2[0,2]$ and that there is $k \in (0,2)$ such that $\phi''(x) < 0$ for $x \in [0,k)$ and $\phi''(x) > 0$ for $x \in (k,2]$. If $\beta \in (0,1)$ denotes the solution to

$$\phi(2\beta) - \phi(\beta) = \beta\,\phi'(\beta),$$

then

$$r(x) = \begin{cases} \beta - x, & x \in [0,\beta), \\ x, & x \in [\beta,1], \end{cases}$$

induces by $(U, r(U))$ for some standard uniformly distributed $U$ an optimal c-coupling between $\mu$ and $\nu$.
Proof: For the proof we proceed as proposed in [15] and [IT]. Define the following functions:

$$f_1(x) = x\,\phi'(\beta),$$

$$f_2(x) = \tfrac{1}{2}\bigl(\phi(2x) - \phi(2\beta)\bigr) + \beta\,\phi'(\beta).$$

Furthermore set

$$\psi^1_\xi(x) = \phi(\beta - x + \xi) + x\,\phi'(\beta) - \phi(\beta),$$

$$\psi^2_\xi(x) = \phi(x + \xi) - \tfrac{1}{2}\phi(2x) - \tfrac{1}{2}\phi(2\beta) + \beta\,\phi'(\beta),$$

and put for $\xi \in [0,1]$:

$$f(x) = f_1(x)\,I_{[0,\beta)}(x) + f_2(x)\,I_{[\beta,1]}(x),$$

$$\psi_{r(x)}(\xi) = \begin{cases} \psi^1_\xi(x), & x \in [0,\beta), \\ \psi^2_\xi(x), & x \in [\beta,1]. \end{cases}$$

Here $\psi_{r(x)}(\xi)$ plays the role of $\psi_{y,a}(\xi) = c(\xi,y) + a(y)$ with $y = r(x)$ in (10). Now the idea, following Theorem 3.5 and (10), is to show that $y = r(x)$ is in the c-subdifferential of $f(x)$ for all $x \in [0,1]$, which implies optimality of this particular coupling and optimality of the distribution induced by $(U, r(U))$ for the transport problem. For the c-convexity of $f$ and the subdifferential property we need to show:

$$\psi_{r(x)}(x) = f(x) \quad \forall x \in [0,1],$$

$$\psi_{r(x)}(\xi) \le f(\xi) \quad \forall \xi \in [0,1].$$

We start with showing that $\psi_{r(x)}(x) = f(x)$. For $x \in [0,\beta)$ we have that $r(x) = \beta - x$ and

$$\psi_{r(x)}(x) = \psi^1_x(x) = x\,\phi'(\beta) = f_1(x) = f(x).$$

For $x \in [\beta,1]$ we have $r(x) = x$ and

$$\psi_{r(x)}(x) = \psi^2_x(x) = \tfrac{1}{2}\bigl(\phi(2x) - \phi(2\beta)\bigr) + \beta\,\phi'(\beta) = f_2(x) = f(x).$$

It remains to show $\psi_{r(x)}(\xi) \le f(\xi)$ for all $(x,\xi) \in [0,1] \times [0,1]$, which can be achieved by a rather lengthy and careful analysis; for the details see [I]. □


Remark 3.4

If $\beta > 1$ then it can be shown as in the first step of the above proof that $(U, 1 - U)$ yields the optimal coupling. Loosely speaking one could say that the concave behaviour dominates the convex one.

Now we are prepared to answer the sine question. Setting $\phi(z) = \sin(\pi z)$ and $k = 1$ we immediately get:

Corollary 3.8

For $c(x,y) = \sin(\pi(x+y))$ we have that the distribution of the vector $(U, r(U))$ for $U \sim U([0,1])$ with

$$r(x) = \begin{cases} \beta - x, & x \in [0,\beta), \\ x, & x \in [\beta,1], \end{cases}$$

and $\beta = 0.7541996008265638 \approx 0.7542$ which solves

$$\sin(2\pi\beta) - \sin(\pi\beta) = \beta\pi\cos(\pi\beta), \tag{11}$$

is maximizing

$$\int_{[0,1]^2} \sin(\pi(x+y))\,d\gamma(x,y)$$

in the set of all bivariate distributions $\gamma$ with uniform marginals, i.e., in the set of all copulas.
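Equation (11) has no closed-form solution, but $\beta$ is easy to approximate. The sketch below uses plain bisection on $[0.5, 1]$, where the difference of the two sides of (11) changes sign; this bracket, the helper names `g`, `solve_beta` and `coupling_value`, and the closed form of the integral along the support of $(U, r^a(U))$ are worked out here for illustration and are not taken from the text.

```python
import math


def g(beta):
    """Left-hand side minus right-hand side of equation (11)."""
    return (math.sin(2 * math.pi * beta) - math.sin(math.pi * beta)
            - beta * math.pi * math.cos(math.pi * beta))


def solve_beta(lo=0.5, hi=1.0, tol=1e-14):
    """Bisection for the root of g in [lo, hi]; assumes g(lo) < 0 < g(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def coupling_value(a):
    """Integral of sin(pi(x + y)) along y = a - x on [0, a) and y = x on [a, 1]."""
    return a * math.sin(math.pi * a) + (math.cos(2 * math.pi * a) - 1.0) / (2 * math.pi)


if __name__ == "__main__":
    beta = solve_beta()
    print(beta, coupling_value(beta))
```

Both Fréchet bounds correspond to `coupling_value(1.0)` and `coupling_value(0.0)`, which vanish up to rounding, so the value at $\beta$ is strictly larger than what either bound achieves for this cost.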


Remark 3.5

In this situation equation (11) meets the first order condition when looking at couplings of the form $(U, r^a(U))$ with

$$r^a(x) = \begin{cases} a - x, & x \in [0,a), \\ x, & x \in [a,1], \end{cases}$$

or explicitly maximizing ($c(x,y) = \sin(\pi(x+y))$)

$$H(a) := \int_0^a c(x, a - x)\,dx + \int_a^1 c(x,x)\,dx.$$

4 Approximations

In this section we are going to introduce some implementable approximation methods for the optimal transport problem. Since the computational methods are based on the assignment problem, see Burkard et al. m, we recapitulate it and mention one of the fundamental numerical solution algorithms. Some connections between the particular copula maximization problem and the assignment problem are already given in [25]. There the authors showed that for a piecewise constant cost function the maximizing measure is induced by a shuffle of M whose parameters are linked to the permutation which solves the corresponding assignment problem.

The (linear sum) assignment problem from combinatorial optimization is given by a matrix $(c_{ij})_{1 \le i,j \le n}$ with entries $c_{ij} \in \mathbb{R}$ which represent costs when assigning $j$ to $i$, or when transporting 1 unit of mass from $i$ to $j$. The goal is to match each row to a different column at minimal cost,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij} \to \text{minimum}$$

under the constraints $\sum_{j=1}^{n} x_{ij} = 1$ for $i \in \{1, \dots, n\}$, $\sum_{i=1}^{n} x_{ij} = 1$ for $j \in \{1, \dots, n\}$ and $x_{ij} \in \{0,1\}$ for all $i, j \in \{1, \dots, n\}$. An interesting remark is given in [T31 p. 75]: it states that the problem above is equivalent to its continuous relaxation with $x_{ij} \ge 0$ for all $i, j \in \{1, \dots, n\}$.

Notice that if specifying $\mu = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$, $\nu = \frac{1}{n} \sum_{j=1}^{n} \delta_{y_j}$ for points $\{x_1, \dots, x_n\}$, $\{y_1, \dots, y_n\}$ in the unit interval and identifying $c(x_i, y_j) = c_{ij}$ one recovers exactly the assignment problem from the original transport problem. This connection also represents the first step in the proof of the fundamental Theorem 5.20 from [49].
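The link between the discretized transport problem and the assignment problem can be illustrated for very small $n$. The sketch below enumerates all $n!$ permutations, so it is a brute-force illustration only and not one of the algorithms from the cited references; `solve_assignment` and `discretized_cost` are hypothetical helpers, and the choice of midpoints $x_i = y_i = (i + 1/2)/n$ is an assumption made here.

```python
from itertools import permutations


def solve_assignment(cost, maximize=False):
    """Brute force over all n! permutations; only sensible for small n."""
    n = len(cost)
    best_perm, best_val = None, None
    for perm in permutations(range(n)):
        val = sum(cost[i][perm[i]] for i in range(n))
        better = best_val is None or (val > best_val if maximize else val < best_val)
        if better:
            best_perm, best_val = perm, val
    return best_perm, best_val


def discretized_cost(c, n):
    """Matrix c(x_i, y_j) at the midpoints x_i = y_i = (i + 0.5) / n."""
    pts = [(i + 0.5) / n for i in range(n)]
    return [[c(x, y) for y in pts] for x in pts]


if __name__ == "__main__":
    n = 6
    matrix = discretized_cost(lambda x, y: x * y, n)
    perm_max, val_max = solve_assignment(matrix, maximize=True)
    perm_min, val_min = solve_assignment(matrix)
    print(perm_max, val_max / n)
    print(perm_min, val_min / n)
```

For the sample cost $c(x,y) = xy$ the maximizing permutation is the identity and the minimizing one is the reversal, which are the discrete counterparts of the couplings $r(x) = x$ and $r(x) = 1 - x$ discussed in Section 3.2.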

Remark 4.1

As mentioned above, a standard reference, from the theoretical as well as from the algorithmic point of view, is Burkard et al. m- Another reference for so-called quadratic assignment problems, incorporating a different cost structure, is Çela m In the paper by Beiglböck et al. m the general duality theory of the transport problem is motivated by an explicit study of the assignment problem.
The assignment problem was among the first linear programming problems to be studied extensively. 
Given n workers and n jobs, we know, for every job, the salary that should be paid to each 
worker to perform the job. The goal is to find the best assignment, i.e. each worker 
is assigned to exactly one job and vice versa, in order to minimize the total cost (the sum of the 
salaries). 

One of the many algorithms which solve the linear assignment problem is the Hungarian method, 
due to Kuhn [55] and Munkres [35]. 

In fact the algorithm was developed and published by Kuhn [28], who gave it the name “Hungarian 
method”. Munkres [32] reviewed the algorithm and observed that it is strongly polynomial. 

The problem is formulated as follows: given n workers and n tasks, and an n x n matrix containing 
the cost of assigning each worker to a task, find the cost-minimizing assignment. 

First the problem is written in the following matrix form 


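$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

where the $a_{ij}$'s denote the penalties incurred when worker $i$ performs task $j$. 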

The first step of the algorithm consists in subtracting the smallest entry $a_{ij}$ of the $i$-th row from each 
element in that row. This leads to at least one zero in that row. This procedure is repeated 
for all rows. We now have a matrix with at least one zero per row. We repeat this procedure for 
all columns (i.e. we subtract the minimum element in each column from all the elements in that 
column). 
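
The reduction step can be sketched as follows. The function name `reduce_rows_and_columns` and the 3 x 3 penalty matrix are hypothetical and serve only as an illustration.

```python
def reduce_rows_and_columns(a):
    """Subtract each row's minimum, then each column's minimum."""
    # Row reduction: every row now contains at least one zero.
    rows = [[entry - min(row) for entry in row] for row in a]
    # Column reduction on the row-reduced matrix.
    col_min = [min(col) for col in zip(*rows)]
    return [[entry - m for entry, m in zip(row, col_min)] for row in rows]


# Hypothetical 3 x 3 penalty matrix.
a = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
reduced = reduce_rows_and_columns(a)
print(reduced)  # [[2, 0, 2], [1, 0, 5], [0, 0, 0]]
```

Subtracting a constant from a whole row or column changes the cost of every assignment by the same amount, so the set of optimal assignments is unchanged.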

Then, we draw lines through the rows and columns so that all the zero entries of the matrix are 
covered and the minimum number of such lines is used. Finally, we check whether an optimal assignment 
is possible. If the minimum number of covering lines is exactly n, an optimal assignment of zeros 
is possible and we are finished. Otherwise, if the minimum number of covering lines is less than 
n, an optimal assignment of zeros is not yet possible and we determine the smallest entry not 
covered by any line. We then subtract this entry from each uncovered row, add it to 
each covered column, and perform the optimality test again. 
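
For readers who want to experiment, the following is a minimal sketch of a solver for the linear assignment problem. It is not the line-covering formulation described above but the commonly used variant based on row and column potentials and shortest augmenting paths, which runs in $O(n^3)$ time. The function name `hungarian` and the test matrix are hypothetical.

```python
def hungarian(cost):
    """Solve the n x n linear assignment problem (minimization).

    Returns (total, assignment), where assignment[i] is the column
    assigned to row i. Indices are 1-based internally; index 0 is a dummy.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row currently matched to column j
    way = [0] * (n + 1)   # predecessor columns on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    # Reduced cost of the edge (i0, j).
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Update potentials so that a new zero reduced cost appears.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached an unmatched column
                break
        # Augment the matching along the stored path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


total, assignment = hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
print(total, assignment)  # 5 [1, 0, 2]
```

To maximize instead of minimize, as in the copula problem considered here, one can apply the same routine to the negated cost matrix.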

Due to its many applications and its possible connections with related problems, many 
researchers have become interested in the higher dimensional version of the linear assignment problem, the 
multidimensional assignment problem (MAP), where one aims to find tuples of elements from given sets, such that the total cost of the 
tuples is minimal. While the linear assignment problem is solvable in polynomial time, the MAP 
is NP-hard (see e.g. [I3]). Recently, a new approach based on the Cross-Entropy (CE) method 
has been developed in [M] for solving the MAP. The efficiency of this method is corroborated by 
several tests on large-scale problems. 

4.1 Theoretical basis 

For computational purposes the following fairly general result due to Schachermayer & Teichmann 
[15] is valuable. 

Theorem 4.1 (Th. 3 from [42]) 

Let $c : X \times Y \to \mathbb{R}_{\ge 0}$ be a finitely valued, continuous cost function on Polish spaces $X$, $Y$. Let 
$\{\pi_n\}_{n \ge 0}$ be an approximating sequence of optimizers associated to weakly converging sequences 
$\mu_n \to \mu$ and $\nu_n \to \nu$ as $n \to \infty$, i.e., $\pi_n$ being an optimizer for the transport problem $(c, \mu_n, \nu_n)$. 
Then there is a subsequence $\{\pi_{n_k}\}_{k \ge 0}$ converging weakly to a transport plan $\pi$ on $X \times Y$, which 
optimizes the Monge-Kantorovich problem for $(\mu, \nu, c)$. Any other converging subsequence of 
$\{\pi_n\}_{n \ge 0}$ also converges to an optimizer of the Monge-Kantorovich problem, i.e., the non-empty 
set of adherence points of $\{\pi_n\}_{n \ge 0}$ is a set of optimizers. 

An approximation result suited for the uniform marginals situation is Theorem 2.2 from [25]. Here 
we complement it by proving that a (sub-)sequence of the discrete optimizers converges to an 
optimizer of the limiting continuous problem. 


Theorem 4.2 

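Let $c$ be a continuous function on $[0,1]^2$, let the sets $I^n_{i,j}$ be given as 

$$I^n_{i,j} = \left[\frac{i-1}{2^n}, \frac{i}{2^n}\right[ \times \left[\frac{j-1}{2^n}, \frac{j}{2^n}\right[ \quad \text{for } i,j = 1,\dots,2^n$$

for every $n \ge 1$ and define the functions $\underline{c}_n$, $\overline{c}_n$ as 

$$\underline{c}_n(x,y) = \min_{(x',y') \in I^n_{i,j}} c(x',y'), \quad \text{for all } (x,y) \in I^n_{i,j},$$

$$\overline{c}_n(x,y) = \max_{(x',y') \in I^n_{i,j}} c(x',y'), \quad \text{for all } (x,y) \in I^n_{i,j}. \qquad (12)$$

Furthermore, let $\underline{\gamma}^n_{\max}$, $\overline{\gamma}^n_{\max}$ be maximizing measures for cost functions $\underline{c}_n$ and $\overline{c}_n$ respectively. 

Then 

$$\lim_{n \to \infty} \int_{[0,1[^2} \underline{c}_n(x,y)\, \underline{\gamma}^n_{\max}(dx,dy) = \lim_{n \to \infty} \int_{[0,1[^2} \overline{c}_n(x,y)\, \overline{\gamma}^n_{\max}(dx,dy) = \sup_{\gamma \in \mathcal{C}} \int_{[0,1[^2} c(x,y)\, \gamma(dx,dy). \qquad (13)$$

Furthermore the sequence of maximizers converges, at least along some subsequence, to a maximizer 
of the original problem $\int_{[0,1[^2} c(x,y)\, \gamma(dx,dy)$. 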

Proof: The first statement is already shown in [25]. For the remaining part we can proceed 
in the spirit of Villani’s proof of Theorem 5.20 from [49]. Note that there a sequence of continuous 
functions $c_n$ is considered. We will show the proof for the lower approximation via $\underline{c}_n$. 

At first observe that $\underline{\gamma}^n_{\max}$ converges (at least along some subsequence) to some measure $\gamma^*$ with 
uniform marginals, cf. [23, Thm. 5.21]. 

From above we know that $\underline{\gamma}^n_{\max}$ is concentrated on a $\underline{c}_n$-cyclically monotone set with the consequence 
that for some $N \in \mathbb{N}$ the $N$-fold product measure $(\underline{\gamma}^n_{\max})^{\otimes N}$ is concentrated on the set $S_n(N)$ of points 
$(x_1,y_1),\dots,(x_N,y_N)$ for which 

$$\sum_{i=1}^{N} \underline{c}_n(x_i,y_i) \ge \sum_{i=1}^{N} \underline{c}_n(x_{i+1},y_i),$$

with $x_{N+1} = x_1$. Now fix some $\varepsilon > 0$ and choose $n$ large enough, such that $(\underline{\gamma}^n_{\max})^{\otimes N}$ is concentrated 
on the set $S_\varepsilon(N)$ of points with 

$$\sum_{i=1}^{N} c(x_i,y_i) \ge \sum_{i=1}^{N} c(x_{i+1},y_i) - \varepsilon.$$

Since $c$ is continuous we have that $S_\varepsilon(N)$ is a closed set. This (using indicator functions in the 
weak convergence characterization) implies that also the limiting measure $(\gamma^*)^{\otimes N}$ is concentrated 
on $S_\varepsilon(N)$ for all $\varepsilon > 0$. We can let $\varepsilon \to 0$ and derive that $(\gamma^*)^{\otimes N}$ is concentrated on a set of points 
with 

$$\sum_{i=1}^{N} c(x_i,y_i) \ge \sum_{i=1}^{N} c(x_{i+1},y_i)$$

and therefore $\gamma^*$ is concentrated on a $c$-cyclically monotone set. Since the costs are bounded we 
deduce from Theorem 3.4 b) that $\gamma^*$ is optimal. 

□ 


Remark 4.2 

Theorem 4.1 can be used when approximating $\mu$ and $\nu$ by empirical distributions. Let $X_1, X_2, \dots$ 
be i.i.d. random variables with distribution $\mu$ and $Y_1, Y_2, \dots$ be i.i.d. random variables with 
distribution $\nu$. Then the empirical distributions defined by $\mu_n(-\infty,x] = \frac{1}{n}\sum_{k=1}^{n} 1_{\{X_k \le x\}}$ and 
$\nu_n(-\infty,x] = \frac{1}{n}\sum_{k=1}^{n} 1_{\{Y_k \le x\}}$ converge weakly to $\mu$ and $\nu$, see [Prop. 4.24]. In this situation 
one can solve assignment problems along realizations of the random sequences $\{X_k\}$ and $\{Y_k\}$. 
The spirit of Theorem 4.2 is a little different. There the marginal distributions are fixed to 
be uniform, whereas the cost function is approximated by piecewise constant functions on a 
deterministically chosen grid. This particular situation is linked to a solution of the assignment 
problem via Theorem 2.1 of [25]. 


4.2 Numerical examples 


An explicit implementation of the Hungarian algorithm applied to the cost function $c(x,y) = 
\sin(\pi(x+y))$ can be found in [25]. The authors also provide a numerical solution to a problem in 
financial mathematics, namely the First-to-Default (FTD) swap. This is a contract between an 
insurance buyer and an insurance seller. The buyer makes periodic premium payments, called 
spreads, until the maturity of the contract or the default, whichever occurs first. In exchange 
the seller compensates the loss caused by the default at the time of default. In [25] an 
approximation of the value of the maximal spread is provided. 

Of course other applications are possible and interesting, but this goes beyond the purposes of the 
present paper. 

Our aim is to consider some cost functions $c$ involving the sine function as in [25] and to show how 
the support of the copula where the maximum is attained can vary considerably. More precisely, 
we consider 

$$\limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} c(x_n, y_n), \qquad (14)$$

with $c(x,y) = \sin(2\pi x)\sin(2\pi y)$, $c(x,y) = \sin(2\pi x)\cos(2\pi y)$ and $c(x,y) = \sin(2\pi/x)\cos(2\pi y)$. In 
particular, the numerical cost for the transport map in the first case is 0.5, which is the same 
as when using the identity map (explicitly computed). Note that the solution is obviously not 
unique. 

The numerical results are illustrated in Table 1. 


n    $\sin(\pi x)\sin(\pi y)$    $\sin(\pi x)\cos(\pi y)$    $\sin(\pi/x)\cos(\pi y)$
2    0.5                         0.1768                      0.4612
3    0.5                         0.2039                      0.3402
4    0.5                         0.2102                      0.5067
5    0.5                         0.2117                      0.4012
6    0.5                         0.2121                      0.4580
7    0.5                         0.2122                      0.4400

Table 1: Upper bounds for the limsup in (14). 
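
The text does not state how the cost functions were discretized to obtain Table 1. The sketch below is therefore only an experiment under assumptions: it takes the cost functions as written in the table header, evaluates them at the midpoints of a grid with $2^n$ cells per axis, and uses the fact that for a product cost $c(x,y) = f(x)g(y)$ the maximizing permutation pairs the sorted values of $f$ with the sorted values of $g$ (rearrangement inequality), so no general assignment solver is needed. The name `sorted_pairing_value` is hypothetical.

```python
import math


def sorted_pairing_value(f, g, n):
    """Maximal average of f(x_i) * g(y_sigma(i)) over permutations sigma.

    Assumption (not stated in the source): x_i = y_i = (2i + 1) / 2**(n + 1)
    are the midpoints of a grid with 2**n cells on the unit interval.
    """
    m = 2 ** n
    mid = [(2 * i + 1) / (2 * m) for i in range(m)]
    # Rearrangement inequality: pair both value lists in increasing order.
    fs = sorted(f(x) for x in mid)
    gs = sorted(g(y) for y in mid)
    return sum(a * b for a, b in zip(fs, gs)) / m


def sin_pi(t):
    return math.sin(math.pi * t)


def cos_pi(t):
    return math.cos(math.pi * t)


for n in range(2, 8):
    print(n,
          round(sorted_pairing_value(sin_pi, sin_pi, n), 4),
          round(sorted_pairing_value(sin_pi, cos_pi, n), 4))
```

Under these assumptions the case $n = 2$ gives $0.5$ for the first cost function and $\sqrt{2}/8 \approx 0.1768$ for the second, in line with the first row of the table. Whether the tabulated values were actually computed this way is not confirmed by the text.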
Figure 1: Support of copula which attains upper bound for $c(x,y) = \sin(\pi x)\sin(\pi y)$ and n = 10 

Figure 2: Support of copula which attains upper bound for $c(x,y) = \sin(\pi x)\cos(\pi y)$ and n = 10 

Figure 3: Support of copula which attains upper bound for $c(x,y) = \sin(\pi/x)\cos(\pi y)$ and n = 10 